98 research outputs found

    VST++: Efficient and Stronger Visual Saliency Transformer

    While previous CNN-based models have exhibited promising results for salient object detection (SOD), their ability to explore global long-range dependencies is restricted. Our previous work, the Visual Saliency Transformer (VST), addressed this constraint from a transformer-based sequence-to-sequence perspective, to unify RGB and RGB-D SOD. In VST, we developed a multi-task transformer decoder that concurrently predicts saliency and boundary outcomes in a pure transformer architecture. Moreover, we introduced a novel token upsampling method called reverse T2T for predicting a high-resolution saliency map effortlessly within transformer-based structures. Building upon the VST model, we further propose an efficient and stronger VST version in this work, i.e. VST++. To mitigate the computational costs of the VST model, we propose a Select-Integrate Attention (SIA) module, partitioning foreground into fine-grained segments and aggregating background information into a single coarse-grained token. To incorporate 3D depth information with low cost, we design a novel depth position encoding method tailored for depth maps. Furthermore, we introduce a token-supervised prediction loss to provide straightforward guidance for the task-related tokens. We evaluate our VST++ model across various transformer-based backbones on RGB, RGB-D, and RGB-T SOD benchmark datasets. Experimental results show that our model outperforms existing methods while achieving a 25% reduction in computational costs without significant performance compromise. The demonstrated strong ability for generalization, enhanced performance, and heightened efficiency of our VST++ model highlight its potential

    SAMN: A Sample Attention Memory Network Combining SVM and NN in One Architecture

    Support vector machines (SVM) and neural networks (NN) are strongly complementary. SVM focuses on operations among samples, while NN focuses on operations among the features within a sample. Combining SVM and NN is therefore promising, as the combination may be more powerful than either alone. However, current work on combining them lacks true integration. To address this, we propose a sample attention memory network (SAMN) that combines SVM and NN by adding a sample attention module, class prototypes, and a memory block to the NN. SVM can be viewed as a sample attention machine, which allows us to add a sample attention module to the NN to implement the main function of SVM. Class prototypes are representatives of all classes and can be viewed as alternatives to support vectors. The memory block stores and updates the class prototypes. Together, the class prototypes and memory block reduce the computational cost of sample attention and make SAMN suitable for multi-class classification tasks. Extensive experiments show that SAMN achieves better classification performance than a single SVM or a single NN with a similar number of parameters, as well as the previous best model for combining SVM and NN. The sample attention mechanism is a flexible module that can be easily deepened and incorporated into neural networks that require it.

    MaxMin-L2-SVC-NCH: A New Method to Train Support Vector Classifier with the Selection of Model's Parameters

    The selection of model parameters plays an important role in the application of support vector classification (SVC). The commonly used method of selecting model parameters is k-fold cross validation with grid search (CV). It is extremely time-consuming because it needs to train a large number of SVC models. In this paper, a new method is proposed to train SVC while selecting the model parameters. First, training SVC with parameter selection is modeled as a minimax optimization problem (MaxMin-L2-SVC-NCH), in which the minimization problem finds the closest points between two normal convex hulls (L2-SVC-NCH) while the maximization problem finds the optimal model parameters. A lower time complexity can be expected in MaxMin-L2-SVC-NCH because CV is abandoned. A gradient-based algorithm is then proposed to solve MaxMin-L2-SVC-NCH, in which L2-SVC-NCH is solved by a projected gradient algorithm (PGA) while the maximization problem is solved by a gradient ascent algorithm with a dynamic learning rate. To demonstrate the advantages of the PGA in solving L2-SVC-NCH, we compare the PGA with the well-known sequential minimal optimization (SMO) algorithm, after providing an SMO algorithm and some KKT conditions for L2-SVC-NCH. The comparison reveals that the SMO algorithm is a special case of the PGA, so the PGA offers more flexibility. Comparative experiments between MaxMin-L2-SVC-NCH and the classical parameter selection models on public datasets show that MaxMin-L2-SVC-NCH greatly reduces the number of models to be trained without losing test accuracy relative to the classical models. This indicates that MaxMin-L2-SVC-NCH performs better than the other models. We strongly recommend MaxMin-L2-SVC-NCH as a preferred model for SVC tasks.
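
To make the inner minimization more concrete, the following is a minimal sketch of a projected gradient method for the closest points between two ordinary Euclidean convex hulls. It is not the authors' algorithm: the abstract's L2-SVC-NCH works with normal convex hulls and a dynamic learning rate, whereas this sketch assumes plain hulls, a fixed step size, and no kernel. The function names (`project_simplex`, `nearest_hull_points`) and the sample points are illustrative assumptions.

```python
import math

def project_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1} (sort-based)."""
    u = sorted(v, reverse=True)
    running, theta = 0.0, 0.0
    for i, ui in enumerate(u, start=1):
        running += ui
        t = (running - 1.0) / i
        if ui - t > 0:          # condition holds for a prefix; keep the last t
            theta = t
    return [max(x - theta, 0.0) for x in v]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def combine(points, weights):
    """Convex combination sum_i weights[i] * points[i]."""
    dim = len(points[0])
    return [sum(w * p[k] for p, w in zip(points, weights)) for k in range(dim)]

def nearest_hull_points(P, Q, iters=500):
    """Minimise ||sum_i a_i P_i - sum_j b_j Q_j||^2 over two simplices."""
    a = [1.0 / len(P)] * len(P)
    b = [1.0 / len(Q)] * len(Q)
    # Fixed step 1/L, with L bounded by twice the sum of squared point norms
    # (assumes at least one point is non-zero).
    step = 1.0 / (2.0 * sum(dot(x, x) for x in P + Q))
    for _ in range(iters):
        p, q = combine(P, a), combine(Q, b)
        d = [pi - qi for pi, qi in zip(p, q)]
        # Gradient is 2 * <P_i, d> for a_i and -2 * <Q_j, d> for b_j.
        a = project_simplex([ai - step * 2.0 * dot(x, d) for ai, x in zip(a, P)])
        b = project_simplex([bi + step * 2.0 * dot(x, d) for bi, x in zip(b, Q)])
    return combine(P, a), combine(Q, b)

# Two triangles whose facing edges lie on x = 0 and x = 3.
P = [(0.0, 0.0), (0.0, 2.0), (-1.0, 1.0)]
Q = [(3.0, 0.0), (3.0, 2.0), (4.0, 1.0)]
p, q = nearest_hull_points(P, Q)
gap = math.dist(p, q)
```

Each iteration takes a gradient step on the hull weights and then projects them back onto the probability simplex, which is the basic pattern a PGA follows.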

    YOLOv5-TS: Detecting traffic signs in real-time

    Traffic sign detection plays a vital role in assisted driving and automatic driving. YOLOv5, as a one-stage object detection solution, is well suited to traffic sign detection. However, it suffers from false detections and missed detections of small objects. To address this issue, we improve YOLOv5 and introduce YOLOv5-TS in this work. In YOLOv5-TS, a spatial pyramid with depth-wise convolution is proposed by replacing the maximum pooling operations in spatial pyramid pooling with depth-wise convolutions. It is applied to the backbone to extract multi-scale features while preventing feature loss. A Multiple Feature Fusion module is proposed to fuse multi-scale feature maps multiple times, with the purpose of enhancing both the semantic expression ability and the detail expression ability of the feature maps. To improve accuracy in detecting small and even extra-small objects, a specialized detection layer is introduced that uses the highest-resolution feature map. In addition, a new method based on k-means++ is proposed to generate stable anchor boxes. Experiments on the data set verify the usefulness and effectiveness of our work.
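
The abstract does not describe the anchor-generation method beyond saying it is based on k-means++. As a rough illustration of the general idea only, the sketch below clusters (width, height) pairs with standard k-means++ seeding and a 1 - IoU distance, a common choice for YOLO-style anchors. The function names, the distance, the mean-based centre update, and the sample boxes are all assumptions, not details from the paper.

```python
import random

def iou_wh(a, b):
    """IoU of two boxes given as (width, height), aligned at a shared corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)

def kmeanspp_anchors(boxes, k, iters=50, seed=0):
    """Cluster (w, h) pairs into k anchors, returned in order of increasing area."""
    rng = random.Random(seed)
    # k-means++ seeding: first centre uniformly at random, later centres with
    # probability proportional to squared distance to the nearest chosen centre.
    centres = [rng.choice(boxes)]
    while len(centres) < k:
        d2 = [min(1.0 - iou_wh(b, c) for c in centres) ** 2 for b in boxes]
        if sum(d2) == 0:
            raise ValueError("fewer distinct boxes than k")
        centres.append(rng.choices(boxes, weights=d2, k=1)[0])
    # Lloyd iterations: assign each box to its highest-IoU centre, then
    # move each centre to the mean width and height of its cluster.
    for _ in range(iters):
        clusters = [[] for _ in centres]
        for b in boxes:
            j = max(range(k), key=lambda i: iou_wh(b, centres[i]))
            clusters[j].append(b)
        new = [
            (sum(w for w, _ in cl) / len(cl), sum(h for _, h in cl) / len(cl))
            if cl else c                      # keep the centre of an empty cluster
            for cl, c in zip(clusters, centres)
        ]
        if new == centres:
            break
        centres = new
    return sorted(centres, key=lambda c: c[0] * c[1])

# Hypothetical box sizes in pixels: small, medium, and large signs.
boxes = [(10, 11), (12, 10), (9, 9), (50, 42), (48, 40), (52, 45),
         (200, 180), (190, 185), (210, 175)]
anchors = kmeanspp_anchors(boxes, k=3)
```

Fixing the seed makes the result repeatable across runs, which is one simple way to get stable anchors; whether the paper achieves stability this way is not stated in the abstract.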

    Zero-Shot Rumor Detection with Propagation Structure via Prompt Learning

    The spread of rumors along with breaking events seriously hinders the truth in the era of social media. Previous studies reveal that, due to the lack of annotated resources, rumors presented in minority languages are hard to detect. Furthermore, unforeseen breaking events not covered in yesterday's news exacerbate the scarcity of data resources. In this work, we propose a novel zero-shot framework based on prompt learning to detect rumors falling in different domains or presented in different languages. More specifically, we first represent rumors circulating on social media as diverse propagation threads, then design a hierarchical prompt encoding mechanism to learn language-agnostic contextual representations for both prompts and rumor data. To further enhance domain adaptation, we model the domain-invariant structural features from the propagation threads to incorporate structural position representations of influential community responses. In addition, a new virtual response augmentation method is used to improve model training. Extensive experiments conducted on three real-world datasets demonstrate that our proposed model achieves much better performance than state-of-the-art methods and exhibits a superior capacity for detecting rumors at early stages. Comment: AAAI 202

    Scalable mode division multiplexed transmission over a 10-km ring-core fiber using high-order orbital angular momentum modes

    We propose and demonstrate a scalable mode division multiplexing scheme based on orbital angular momentum modes in ring core fibers. In this scheme, the high-order mode groups of a ring core fiber are sufficiently de-coupled by the large differential effective refractive index so that multiple-input multiple-output (MIMO) equalization is only used for crosstalk equalization within each mode group. We design and fabricate a graded-index ring core fiber that supports 5 mode groups with low inter-mode-group coupling, small intra-mode-group differential group delay, and small group velocity dispersion slope over the C-band for the high-order mode groups. We implement a two-dimensional wavelength- and mode-division multiplexed transmission experiment involving 10 wavelengths and 2 mode groups each with 4 OAM modes, transmitting 32 GBaud Nyquist QPSK signals over all 80 channels. An aggregate capacity of 5.12 Tb/s and an overall spectral efficiency of 9 bit/s/Hz over 10 km are realized, only using modular 4x4 MIMO processing with 15 taps to recover signals from the intra-mode-group mode coupling. Given the fixed number of modes in each mode group and the low inter-mode-group coupling in ring core fibres, our scheme strikes a balance in the trade-off between system capacity and digital signal processing complexity, and therefore has good potential for capacity upscaling at an expense of only modularly increasing the number of mode-groups with fixed-size (4x4) MIMO blocks
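
The channel count and aggregate capacity quoted in the abstract can be checked with a few lines of arithmetic. The sketch below uses only the figures stated above plus the standard fact that QPSK carries 2 bits per symbol; the variable names are illustrative. The result is a raw line rate, and the 9 bit/s/Hz spectral efficiency is not reproduced because it depends on channel spacing and overhead that the abstract does not give.

```python
# Figures taken from the abstract.
wavelengths = 10
mode_groups = 2
modes_per_group = 4
symbol_rate_gbaud = 32
bits_per_symbol = 2  # QPSK: 2 bits per symbol

channels = wavelengths * mode_groups * modes_per_group
raw_rate_gbps = channels * symbol_rate_gbaud * bits_per_symbol

print(channels)              # 80
print(raw_rate_gbps / 1000)  # 5.12 (Tb/s)
```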

    The ALMA-QUARKS survey: -- I. Survey description and data reduction

    This paper presents an overview of the QUARKS survey, which stands for `Querying Underlying mechanisms of massive star formation with ALMA-Resolved gas Kinematics and Structures'. The QUARKS survey is observing 139 massive clumps covered by 156 pointings at ALMA Band 6 ($\lambda \sim 1.3$ mm). In conjunction with data obtained from the ALMA-ATOMS survey at Band 3 ($\lambda \sim 3$ mm), QUARKS aims to carry out an unbiased statistical investigation of the massive star formation process within protoclusters down to a scale of 1000 au. This overview paper describes the observations and data reduction of the QUARKS survey, and gives a first look at an exemplar source, the mini-starburst Sgr B2(M). The wide-bandwidth (7.5 GHz) and high-angular-resolution (~0.3 arcsec) observations of the QUARKS survey make it possible to resolve much more compact cores than the ATOMS survey could, and to detect previously unrevealed fainter filamentary structures. The spectral windows cover transitions of species including CO, SO, N$_2$D$^+$, SiO, H$_{30}\alpha$, H$_2$CO, CH$_3$CN and many other complex organic molecules, tracing gas components with different temperatures and spatial extents. QUARKS aims to deepen our understanding of several scientific topics of massive star formation, such as the mass transport within protoclusters by (hub-)filamentary structures, the existence of massive starless cores, the physical and chemical properties of dense cores within protoclusters, and the feedback from already formed high-mass young protostars. Comment: 9 figures, 4 tables, accepted by RA